Introduction to Spatial Analysis
Day 1 - Concepts and Datasets
Jonathan Phillips
January, 2019
Geography
- What is your favourite sport?
- Do you speak Spanish?
- Do you know who Fofão is?
- How many kisses on the cheek do you greet someone with?
- If you are on your own in a taxi do you sit in the front or back?
- Do you think government policy should allow free migration?
- Where do you live?
Geography
Knowledge and communication depend on where we live
Social norms and customs depend on where we live
Political preferences depend on where we live
Geography
- Tobler’s First Law of Geography:
“Everything is related to everything else, but near things are more related than distant things”
Geography
What does ‘near’ mean?
- Concepts of distance:
- Euclidean
- Great Circle
- Manhattan
- Levenshtein
- Mahalanobis
- Driving
- Network
- Minimum-cost
- Genetic
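A minimal sketch of four of these distance concepts, using only the Python standard library. The function names are illustrative, the great-circle version uses the haversine formula, and the Earth is assumed to be a sphere of radius 6371 km, which is an approximation.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points on a flat plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def manhattan(p, q):
    """Distance when travel is restricted to a rectangular street grid."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance over the surface of a sphere (assumed radius)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def levenshtein(s, t):
    """Minimum number of single-character edits that turn s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

The same pair of places can therefore be 'near' under one definition and 'far' under another: two points at (0, 0) and (3, 4) are 5 units apart in Euclidean terms but 7 units apart on a street grid.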
Geography
- What does ‘related’ mean?
- Correlated
- More similar
- More different (e.g. neighbouring areas are given dissimilar dialling codes to avoid misdialling)
- ‘Related’ does not mean one person ‘causes’ a similar effect on another
- It may just be a common response to a similar environment
Geography
- [Map of spatial autocorrelation]
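One common way to put a number on 'near things are more related' is Moran's I. The slides do not name a specific statistic, so this is offered only as an illustrative sketch: the function name, the toy values and the neighbour matrix (four units in a row, each a neighbour of the units next to it) are all assumptions.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square matrix of spatial weights.

    weights[i][j] > 0 means units i and j count as neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighbouring pairs
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)

# Hypothetical neighbour matrix: four units along a line
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = morans_i([1, 2, 3, 4], chain)       # neighbours are similar
alternating = morans_i([1, 4, 1, 4], chain)  # neighbours are dissimilar
```

Here `smooth` is positive (neighbours resemble each other) and `alternating` is negative (neighbours differ), matching the 'more similar' and 'more different' senses of 'related'.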
Geography
- But isn’t the world getting smaller?
- ‘The death of distance’
- Everything is ‘near’ on the internet
- Relevant distances may be changing
- Cost of flights instead of kilometres or hours
- Language and social network instead of proximity to radio tower
- Spatial relationships take place at multiple scales
- I am Welsh, British, European etc.
- The similarities between rural China and rural Russia are greater than the differences
Geography
- Lots of interesting questions are really non-spatial
- We can draw maps of them
- But the conclusion does not depend on the locations of the units
| Non-spatial question | Spatial question |
|---|---|
| Which state in Brazil is richest? (DF) | Where in Brazil are states richest? (Southeast) |
| How many countries have had cases of Ebola? (11) | Which part of Africa was affected by Ebola? (West and Central) |
| What is the population of the USA? (~325m) | How many people live west of the Mississippi? (~136m) |
Geography
- Physical features also affect social and political processes
- Attracting economic activity
- Preventing interactions
Merits of Spatial Analysis
Opportunities:
- Deeper explanations for common outcomes
- Avoid confounding relationships
- Enabling new inferential methodologies
Limitations:
- Data are not ‘independent’ for statistical analysis
- Data are often aggregated, and the level of aggregation affects our conclusions (Modifiable Areal Unit Problem, Ecological Fallacy)
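A toy illustration of the Modifiable Areal Unit Problem, assuming six hypothetical observations along a line. The values and the `zone_means` helper are invented for illustration only.

```python
def zone_means(values, zone_size):
    """Average consecutive observations within zones of `zone_size` units."""
    return [
        sum(values[i:i + zone_size]) / zone_size
        for i in range(0, len(values), zone_size)
    ]

# Hypothetical data: a 'hotspot' in the middle of the study area
values = [1, 1, 9, 9, 1, 1]

small_zones = zone_means(values, 2)  # three zones of two units each
large_zones = zone_means(values, 3)  # two zones of three units each
```

With three small zones the hotspot is obvious (the middle zone averages 9 against 1 on either side). With two large zones it is split across the boundary and both zones have the same average, so the same underlying data suggest no spatial variation at all.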
Map Literacy
- Maps are clear and convincing
- Patterns may only be visible when arranged spatially
- If you have spatial data, why put it in a table or a chart?
Map Literacy
- But maps still require careful interpretation
Map Literacy
- Scale
- Can I walk from The Art Institute of Chicago to Union Station in 10 minutes?

Map Literacy
- Compass
- What’s the best place to view the sunset in the Wirral (UK)?

Map Literacy
- Legend
- Can be manipulated to convey relevant (or misleading!) conclusions

Vector vs. Raster Data
- Vector
- Start with a blank page
- Add specific objects (points, lines, polygons) defined by coordinates (x,y)
- The computer stores just the coordinates of the objects
- Non-spatial ‘Attributes’ of each object allow complex analyses
- Raster
- Start with a grid
- Each grid square (pixel) has a value
- The computer stores one value for every grid square (fixed memory size)
- Mostly for ‘continuous’ remote sensing (satellite) images
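The two storage models can be sketched with plain Python structures. The layer contents, attribute names and values below are hypothetical; real spatial libraries use richer types, so treat this only as a picture of what is being stored.

```python
# Vector: each object stores its own coordinates plus non-spatial attributes.
vector_layer = [
    {"type": "Point",
     "coords": [(2.0, 1.0)],
     "attributes": {"name": "School A", "pupils": 320}},
    {"type": "Line",
     "coords": [(0.0, 0.0), (3.0, 3.0)],
     "attributes": {"name": "Main road"}},
    {"type": "Polygon",
     "coords": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (0.0, 0.0)],
     "attributes": {"name": "District 1"}},
]

# Raster: a fixed grid with exactly one value per cell, empty or not.
raster_layer = [
    [0, 0, 13],
    [2, 5, 0],
]

# Vector storage grows with the number of vertices we choose to record...
n_vertices = sum(len(obj["coords"]) for obj in vector_layer)
# ...while raster storage is fixed by the grid dimensions.
n_cells = len(raster_layer) * len(raster_layer[0])
```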
Types of Vector Data
- Whether a feature is represented as a point, line or polygon is an analysis choice, and depends on scale
Types of Vector Data
- The attributes we assign to vector objects also vary

Locations in Space
- Latitude = Angle from the equator (N/S)
- Longitude = Angle from Greenwich, London (E/W)

Locations in Space
- Longitude & Latitude can be written in different formats
- DMS: 49°30′00″N, 123°30′00″W
- DM: 49°30.0′, -123°30.0′
- Decimal Degrees: 49.5000°, -123.5000°
- But all of these use the same Geographic Coordinate System
- And we ‘always’ use the same one
- WGS-84
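Converting between these formats is simple arithmetic: a degree has 60 minutes and a minute has 60 seconds, and south and west are written as negative numbers. A minimal sketch, with a hypothetical helper name:

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees-minutes-seconds to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W"; S and W become negative.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

lat = dms_to_decimal(49, 30, 0, "N")   # latitude from the slide
lon = dms_to_decimal(123, 30, 0, "W")  # longitude from the slide
```

The results are 49.5 and -123.5, matching the decimal degrees shown above.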
Locations in Space

- The Earth is not a perfect sphere but an oblate spheroid; a ‘datum’ models its shape so that we get the location correct
- No need to worry about this: WGS-84 includes its own datum
Locations in Space
- But we view maps on flat surfaces: paper or screens
- To produce flat maps we need a Projected Coordinate Reference System
- Translating 3-D locations to 2-D locations
- There are many different ways to do this, just as there are many ways to peel an orange
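To make 'many different ways' concrete, here is a sketch of two simple projections: equirectangular (plate carrée) and spherical Mercator. Both treat the Earth as a sphere with an assumed radius, which real projected coordinate systems do not, so this illustrates the idea rather than serving as a substitute for a projection library.

```python
import math

R = 6371000.0  # assumed spherical Earth radius, in metres

def equirectangular(lon, lat):
    """Plot longitude and latitude directly as x and y."""
    return R * math.radians(lon), R * math.radians(lat)

def mercator(lon, lat):
    """Spherical Mercator: stretches y increasingly towards the poles."""
    lam, phi = math.radians(lon), math.radians(lat)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))

# The same 3-D location lands at different 2-D positions
x_eq, y_eq = equirectangular(0.0, 60.0)
x_me, y_me = mercator(0.0, 60.0)
```

Both projections agree on x, but at 60°N Mercator places the point noticeably further from the equator than the equirectangular projection does.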
Locations in Space
- Projections can preserve shape, area or distance, but not all three!

Locations in Space
- Coordinate Reference Systems have useful shortcut EPSG codes
- In R, this is all you need

| Coordinate Reference System | Type | EPSG code |
|---|---|---|
| WGS-84 | Geographic | 4326 |
| Corrego Alegre / UTM zone 23S (Coastal Brazil) | Projected | 22523 |
| Chua / UTM zone 23S (Distrito Federal) | Projected | 4071 |
Locations in Space
- Which Coordinate Reference System (CRS) should I use?
- Important: You don’t choose - your data sources already come with a specific CRS
- Important: ALL data in the analysis must use the same CRS
- That means sometimes we have to transform from one coordinate system to another
- For projections, do you want to convey shape, area or distance accurately?
Georeferencing
- With a CRS, computers understand locations such as -23.562778, -46.725261
- But what if we have a street address?
Spatial Datasets
- Vector Spatial Datasets
- Coordinates for every object
- Multiple coordinates for lines, polygons
| ID | Name | Coordinates |
|---|---|---|
| 001 | Minas Gerais | -48.77246, -17.773988 |
| 002 | Rio de Janeiro | -49.24686, -16.819800 |
Spatial Datasets
- Vector Spatial Datasets
- Coordinates for every object
- Multiple coordinates for lines, polygons
| ID | Name | Geometry |
|---|---|---|
| 001 | Minas Gerais | MULTIPOLYGON ((( -48.77246 -17.773988, -48.77252 -17.773970, -48.77266 -17.773990))) |
| 002 | Rio de Janeiro | MULTIPOLYGON ((( -49.24686 -16.819800, -49.24701 -16.819812, -49.24707 -16.819838))) |
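The geometry text in the table is in Well-Known Text (WKT) form: pairs of 'x y' numbers separated by commas. As a rough sketch, the vertices can be pulled out with a regular expression. The helper below is hypothetical, ignores how rings and polygons are nested, and is not a full WKT parser. Note that the geometry on the slide is abbreviated, so its ring is not closed.

```python
import re

def wkt_coordinates(wkt):
    """Return every 'x y' pair in a WKT string as a list of (x, y) floats."""
    number = r"-?\d+(?:\.\d+)?"
    pairs = re.findall(rf"({number})\s+({number})", wkt)
    return [(float(x), float(y)) for x, y in pairs]

wkt = ("MULTIPOLYGON ((( -48.77246 -17.773988, "
       "-48.77252 -17.773970, -48.77266 -17.773990)))")
vertices = wkt_coordinates(wkt)
```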
Spatial Datasets
- One single ‘Multipolygon’ can be complicated
- Composed of many distinct polygons
- Polygons can have ‘holes’ in them
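A sketch of why this structure matters when we compute something as simple as area. It assumes planar (projected) coordinates, closed rings (first vertex repeated at the end), and a multipolygon stored as a list of polygons, each a list of rings with the outer ring first. The shapes are hypothetical.

```python
def ring_area(ring):
    """Unsigned area of a closed ring using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def polygon_area(polygon):
    """Outer ring area minus the area of any holes."""
    outer, *holes = polygon
    return ring_area(outer) - sum(ring_area(h) for h in holes)

def multipolygon_area(multipolygon):
    """Sum of the areas of all member polygons."""
    return sum(polygon_area(p) for p in multipolygon)

# Hypothetical multipolygon: a 4x4 square with a 1x1 hole, plus a 2x2 island
square_with_hole = [
    [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)],  # outer ring
    [(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)],  # hole
]
island = [
    [(10, 10), (12, 10), (12, 12), (10, 12), (10, 10)],
]
total_area = multipolygon_area([square_with_hole, island])
```

The total is 16 - 1 + 4 = 19 square units: the hole is subtracted and the separate island is added.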

Spatial Datasets
- Raster Spatial Datasets
- Coordinates for every data point
| Longitude | Latitude | Value |
|---|---|---|
| -106.05 | 35.96 | 0 |
| -106.06 | 35.96 | 13 |
| -105.07 | 35.96 | 2 |
| -105.08 | 35.96 | 0 |
Spatial Datasets
- Historically, vector data has been stored as shapefiles
- A shapefile splits the geometry, attribute table, index and projection into separate files
| File | Contents |
|---|---|
| Data.shp | Geometry details |
| Data.dbf | Non-spatial attribute data (a table) |
| Data.shx | Indexing of the geometry to match the table |
| Data.prj | Details of the projection |
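Because a shapefile is really several files, a common source of errors is copying only the .shp. A small sketch of a check for the four components listed above. The helper name is hypothetical and it works on a list of file names rather than reading the disk.

```python
from pathlib import PurePath

# The four components listed in the table above
EXPECTED = (".shp", ".dbf", ".shx", ".prj")

def missing_components(filenames, stem="Data"):
    """Return the expected extensions that are absent for a given file stem."""
    present = {
        PurePath(name).suffix.lower()
        for name in filenames
        if PurePath(name).stem == stem
    }
    return [ext for ext in EXPECTED if ext not in present]

# Hypothetical folder listing in which two components were left behind
missing = missing_components(["Data.shp", "Data.dbf", "notes.txt"])
```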
Spatial Datasets
- Raster data is typically stored as GeoTIFF (.tif/.tiff) files
- The same image format as you get from a camera or scanner
- But with embedded location and projection data so that we know where on Earth the image belongs
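The location data attached to a raster is essentially an origin and a cell size, which is enough to turn any (row, column) position in the image into a coordinate. A minimal sketch, assuming a north-up grid with square cells whose origin is the top-left corner. The function name and the numbers are hypothetical.

```python
def cell_centre(row, col, x_origin, y_origin, cell_size):
    """Coordinates of the centre of grid cell (row, col).

    Rows count downwards from the top edge, so y decreases as row increases.
    """
    x = x_origin + (col + 0.5) * cell_size
    y = y_origin - (row + 0.5) * cell_size
    return x, y

# Hypothetical grid: top-left corner at (-106.0, 36.0), cells of 0.5 degrees
first_cell = cell_centre(0, 0, -106.0, 36.0, 0.5)
other_cell = cell_centre(1, 2, -106.0, 36.0, 0.5)
```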
Non-Spatial Joins
- Most of our data is non-spatial, but could be made spatial
- Election results
- Death rates
- Welfare payments
- Conflict
- We can make this data spatial if we link it to existing spatial (location) data
- Using common identifiers in both datasets
- Non-spatial joins
Non-Spatial Joins
- Governments publish school performance data
- But what is the spatial pattern of school performance?
- Better in the city centre or in the suburbs?
- We need a source for the location of the schools
- Perhaps from a separate geographical survey
- Or by georeferencing their addresses
- How do we combine the school performance and location datasets?
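A sketch of a non-spatial join in plain Python: the performance table is matched to the location table through a shared identifier. All school IDs, scores and coordinates are invented for illustration, and `left_join` is a hypothetical helper rather than a library function.

```python
# Hypothetical school performance data (no locations)
performance = [
    {"school_id": "S01", "score": 71.5},
    {"school_id": "S02", "score": 64.0},
    {"school_id": "S03", "score": 80.2},
]

# Hypothetical location data from a separate source
locations = [
    {"school_id": "S01", "lon": -46.73, "lat": -23.56},
    {"school_id": "S03", "lon": -46.65, "lat": -23.55},
]

def left_join(left, right, key):
    """Keep every row of `left`, adding the matching row of `right` if any."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]

joined = left_join(performance, locations, "school_id")

# Schools with no match cannot be mapped, so it is worth listing them
unmatched = [row["school_id"] for row in joined if "lon" not in row]
```

Checking `unmatched` after every join is a useful habit: identifiers that differ slightly between sources (extra spaces, leading zeros) silently drop units from the map.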
temp
- examples of types of spatial analysis